
    Identification of genetic factors involved in autism spectrum disorders and dyslexia

    Autism spectrum disorders (ASD) affect approximately 1% of the general population. These disorders are characterized by deficits in social communication as well as stereotyped behaviors and restricted interests. Several genes involved in ASD have been identified, such as NLGN3-4X, NRXN1-3 and SHANK1-3. In previous years, ASD were considered a complex set of monogenic disorders. However, recent genome-wide studies suggest the presence of modifier genes (the "multiple hits" model). Dyslexia is characterized by difficulties in learning to read and write and affects 5-15% of the general population. The genetic factors involved remain unknown for now, as only candidate genes or loci have been identified.
My thesis had two main objectives: to pursue the identification of genetic factors involved in ASD, and to discover a first genetic factor for dyslexia. I therefore studied two populations: on the one hand, patients with ASD (N > 600) from France, Sweden and the Faroe Islands; on the other hand, patients with dyslexia (N > 200) from France, including a family with 11 affected members across 3 generations. I used both Illumina microarrays (600K and 5M) and whole-genome sequencing to conduct linkage and association analyses. For ASD, analyses of copy number variants (CNVs) allowed me to identify new candidate genes and to confirm the association of several synaptic genes with autism. In particular, the study of a population of 30 patients from the Faroe Islands confirmed the involvement of the NLGN1 and NRXN1 genes in autism and identified a new candidate gene, IQSEC3. In parallel, I explored PRRT2, located in 16p11.2. PRRT2 encodes a member of the synaptic SNARE complex, which enables the release of synaptic vesicles. I could not demonstrate any association with ASD, but I showed that this gene, which is important for some neurological diseases, is under different selection pressures depending on the population. For dyslexia, I performed a linkage analysis (lod-score method) in a large family with 11 affected individuals across three generations. This study identified CNTNAP2 as a vulnerability gene for dyslexia. This finding is important because the same gene is also associated with ASD. However, none of the 20 rare variants discovered by whole-genome sequencing is located in the coding regions of the gene. Several variants located in regulatory regions are candidates.
To conclude, my findings enabled the identification of new candidate genes for ASD, confirmed the role of synaptic genes in this disorder, and showed for the first time, through linkage analysis, the role of CNTNAP2 in dyslexia.
PARIS5-Bibliotheque electronique (751069902) / Sudoc, France
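The lod-score method compares the likelihood of the family data under linkage at a recombination fraction θ with the likelihood under no linkage (θ = 0.5). A minimal two-point sketch for the simplest textbook case, assuming phase-known, fully informative meioses; real pedigree analyses like the one described here use dedicated linkage software that also handles unknown phase, penetrance and missing genotypes. The function names are hypothetical.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10(theta^R * (1 - theta)^NR / 0.5^(R + NR))
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    r, nr = recombinants, non_recombinants
    if theta == 0.0 and r > 0:
        return float("-inf")  # a single recombinant rules out complete linkage
    # log10 likelihood under linkage; the R * log10(theta) term is 0 when R == 0
    log_l1 = (r * math.log10(theta) if r else 0.0) + nr * math.log10(1.0 - theta)
    # log10 likelihood under free recombination (theta = 0.5)
    log_l0 = (r + nr) * math.log10(0.5)
    return log_l1 - log_l0


def max_lod(recombinants: int, non_recombinants: int, steps: int = 501):
    """Grid-search the recombination fraction that maximizes the LOD score."""
    grid = [0.5 * i / (steps - 1) for i in range(steps)]
    return max((lod_score(recombinants, non_recombinants, t), t) for t in grid)


# Hypothetical example: 10 informative meioses, none recombinant.
best_lod, best_theta = max_lod(0, 10)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.3f}")
```

By convention, a LOD score of 3 or more (odds of 1000:1 in favor of linkage) is taken as evidence of linkage.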

    Geodesic Sinkhorn: optimal transport for high-dimensional datasets

    Understanding the dynamics and reactions of cells from population snapshots is a major challenge in single-cell transcriptomics. Here, we present Geodesic Sinkhorn, a method for interpolating populations along a data manifold that leverages existing kernels developed for single-cell dimensionality reduction and visualization methods. Our Geodesic Sinkhorn method uses a heat-geodesic ground distance that, compared with Euclidean ground distances, is more accurate for interpolating single-cell dynamics on a wide variety of datasets and significantly speeds up the computation for sparse kernels. We first apply Geodesic Sinkhorn to 10 single-cell transcriptomics time-series interpolation datasets as a drop-in replacement for existing interpolation methods, where it outperforms them on all datasets, showing its effectiveness in modeling cell dynamics. Second, we show how to efficiently approximate the operator with polynomial kernels, allowing us to improve scaling to large datasets. Finally, we define the conditional Wasserstein-average treatment effect and show how it can elucidate the treatment effect on single-cell populations in a drug screen.
    Comment: 15 pages, 5 tables, 5 figures, submitted to RECOMB 202
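The central idea, using a heat kernel on a data graph in place of the Gibbs kernel exp(-C/ε) of entropic optimal transport, can be sketched with dense NumPy. This is not the authors' implementation: the fully connected Gaussian-affinity graph, the bandwidth `sigma`, the diffusion time `t`, and the exact eigendecomposition (instead of the polynomial approximation mentioned for large datasets) are assumptions chosen for readability, and the function names are hypothetical.

```python
import numpy as np


def heat_kernel(points: np.ndarray, sigma: float = 1.0, t: float = 1.0) -> np.ndarray:
    """Heat kernel H_t = exp(-t L) on a dense Gaussian-affinity graph."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    affinity = np.exp(-sq_dists / sigma**2)
    np.fill_diagonal(affinity, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(points)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    kernel = (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T
    return np.clip(kernel, 1e-300, None)  # guard against round-off below zero


def geodesic_sinkhorn(a: np.ndarray, b: np.ndarray, kernel: np.ndarray, n_iter: int = 500) -> np.ndarray:
    """Sinkhorn scaling with a heat kernel in place of the Gibbs kernel exp(-C / eps).

    a, b: weights over the same graph nodes, each summing to 1.
    Returns the transport plan P = diag(u) K diag(v).
    """
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


# Hypothetical toy data: two 2-D point clouds placed on one shared graph.
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 0.3, size=(15, 2))
x1 = rng.normal(0.5, 0.3, size=(15, 2))
points = np.vstack([x0, x1])
a = np.r_[np.full(15, 1 / 15), np.zeros(15)]
b = np.r_[np.zeros(15), np.full(15, 1 / 15)]
plan = geodesic_sinkhorn(a, b, heat_kernel(points, sigma=1.0, t=1.0))
```

Because both populations live on one graph, the source and target are just two weight vectors over its nodes, and the plan's nonzero block `plan[:15, 15:]` carries all the mass from the first cloud to the second.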

    A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction

    Diffusion-based manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high-dimensional, high-throughput, noisy datasets. Such datasets are especially common in fields like biology and physics. While these methods are thought to preserve the underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoretical links have been established. Here, we establish such a link via results in Riemannian geometry that explicitly connect heat diffusion to manifold distances. In this process, we also formulate a more general heat-kernel-based manifold embedding method that we call heat geodesic embeddings. This novel perspective clarifies the choices available in manifold learning and denoising. Results show that our method outperforms the existing state of the art in preserving ground-truth manifold distances and cluster structure in toy datasets. We also showcase our method on single-cell RNA-sequencing datasets with both continuum and cluster structure, where it enables interpolation of withheld timepoints. Finally, we show that the parameters of our more general method can be configured to give results similar to PHATE (a state-of-the-art diffusion-based manifold learning method) as well as SNE (an attraction/repulsion neighborhood-based method that forms the basis of t-SNE).
    Comment: 31 pages, 13 figures, 10 tables
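A standard result connecting heat diffusion to manifold distances is Varadhan's formula, d(x, y)² = lim_{t→0} -4t log h_t(x, y). A minimal sketch that turns a graph heat kernel into approximate distances and then embeds them with classical MDS, assuming a dense Gaussian-affinity graph, a fixed diffusion time, and a normalization by the kernel's diagonal (so self-distances vanish). These choices are made here for simplicity; the paper's heat geodesic embeddings expose more general options, and the function names are hypothetical.

```python
import numpy as np


def graph_heat_kernel(points: np.ndarray, sigma: float, t: float) -> np.ndarray:
    """exp(-t L) for the combinatorial Laplacian L = D - W of a Gaussian-affinity graph."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    w = np.exp(-sq_dists / sigma**2)
    np.fill_diagonal(w, 0.0)
    laplacian = np.diag(w.sum(axis=1)) - w
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T


def heat_geodesic_distances(h: np.ndarray, t: float) -> np.ndarray:
    """Varadhan-style distances d_ij = sqrt(-4 t log(h_ij / sqrt(h_ii h_jj)))."""
    scale = np.sqrt(np.diag(h))
    # H is positive semidefinite with nonnegative entries, so the ratio lies in [0, 1]
    ratio = np.clip(h / np.outer(scale, scale), 1e-300, 1.0)
    dist = np.sqrt(-4.0 * t * np.log(ratio))
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)  # self-distances are zero by definition
    return dist


def classical_mds(dist: np.ndarray, dim: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS embedding of a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist**2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Hypothetical example: 60 points on the unit circle, a 1-D manifold in 2-D.
angles = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
circle = np.c_[np.cos(angles), np.sin(angles)]
t = 2.0
dist = heat_geodesic_distances(graph_heat_kernel(circle, sigma=0.3, t=t), t)
embedding = classical_mds(dist, dim=2)
```

On the circle, the heat-based distance from a point grows with arc length around the manifold, which is the behavior a geodesic proxy should show.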

    Manifold Interpolating Optimal-Transport Flows for Trajectory Inference

    We present a method called Manifold Interpolating Optimal-Transport Flow (MIOFlow) that learns stochastic, continuous population dynamics from static snapshot samples taken at sporadic timepoints. MIOFlow combines dynamic models, manifold learning, and optimal transport by training neural ordinary differential equations (Neural ODEs) to interpolate between static population snapshots, penalized by optimal transport with a manifold ground distance. Further, we ensure that the flow follows the geometry by operating in the latent space of an autoencoder that we call a geodesic autoencoder (GAE). In the GAE, the latent-space distance between points is regularized to match a novel multiscale geodesic distance on the data manifold that we define. We show that this method is superior to normalizing flows, Schrödinger bridges and other generative models that are designed to flow from noise to data, in terms of interpolating between populations. Theoretically, we link these trajectories with dynamic optimal transport. We evaluate our method on simulated data with bifurcations and merges, as well as on scRNA-seq data from embryoid body differentiation and acute myeloid leukemia treatment.
    Comment: Presented at NeurIPS 2022, 24 pages, 7 tables, 14 figures
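The training signal described here pairs forward integration of a learned velocity field with an optimal-transport penalty between the predicted and the observed snapshot. The sketch below shows only that penalty, for a fixed and hypothetical velocity field. Its simplifying assumptions: forward-Euler integration instead of an adaptive Neural ODE solver, the ambient space instead of a geodesic autoencoder's latent space, deterministic (noise-free) dynamics, and an exact 2-Wasserstein distance between equal-size samples via `scipy.optimize.linear_sum_assignment`. It is not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def integrate_euler(x0, velocity, t0, t1, n_steps=50):
    """Push samples from time t0 to t1 along dx/dt = velocity(x, t) with forward Euler."""
    x = np.array(x0, dtype=float)
    dt = (t1 - t0) / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, t0 + k * dt)
    return x


def w2_equal_size(x, y):
    """Exact 2-Wasserstein distance between two uniform samples of equal size."""
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return float(np.sqrt(cost[rows, cols].mean()))


def snapshot_ot_penalty(snapshots, times, velocity):
    """Sum of W2 distances between each pushed-forward snapshot and the next observed one."""
    loss = 0.0
    for i in range(len(snapshots) - 1):
        predicted = integrate_euler(snapshots[i], velocity, times[i], times[i + 1])
        loss += w2_equal_size(predicted, snapshots[i + 1])
    return loss


def constant_drift(x, t):
    """Hypothetical velocity field: move every point by +1 along the first axis per unit time."""
    return np.array([1.0, 0.0])


rng = np.random.default_rng(1)
day0 = rng.normal(size=(30, 2))
day1 = day0 + np.array([1.0, 0.0])
print(snapshot_ot_penalty([day0, day1], [0.0, 1.0], constant_drift))
```

In a trainable version, `velocity` would be a neural network, and this penalty would be minimized with respect to its parameters.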

    Simulation-free Schrödinger bridges via score and flow matching

    We present simulation-free score and flow matching ([SF]²M), a simulation-free objective for inferring stochastic dynamics given unpaired source and target samples drawn from arbitrary distributions. Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous normalizing flows. [SF]²M interprets continuous-time stochastic generative modeling as a Schrödinger bridge (SB) problem. It relies on static entropy-regularized optimal transport, or a minibatch approximation, to efficiently learn the SB without simulating the learned stochastic process. We find that [SF]²M is more efficient and gives more accurate solutions to the SB problem than simulation-based methods from prior work. Finally, we apply [SF]²M to the problem of learning cell dynamics from snapshot data. Notably, [SF]²M is the first method to accurately model cell dynamics in high dimensions and can recover known gene regulatory networks from simulated data.
    Comment: A version of this paper appeared in the New Frontiers in Learning, Control, and Dynamical Systems workshop at ICML 2023. Code: https://github.com/atong01/conditional-flow-matchin
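For a Schrödinger bridge with a Brownian reference of noise scale σ, conditioning on a pair (x0, x1) gives a Brownian bridge with mean μ_t = (1 - t)x0 + t·x1 and standard deviation σ√(t(1 - t)). The flow and score networks can then be regressed onto that bridge's closed-form targets. The NumPy sketch below only builds those regression targets. The pairing is drawn from a log-domain Sinkhorn plan with regularization 2σ², an assumption following the usual correspondence between Schrödinger bridges and entropic OT. Drawing t uniformly from [0.01, 0.99] to keep the targets finite is also an assumption. The networks and training loop are omitted, and all names are hypothetical.

```python
import numpy as np


def _logsumexp(z: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(z))) along one axis."""
    z_max = z.max(axis=axis, keepdims=True)
    return np.squeeze(z_max + np.log(np.exp(z - z_max).sum(axis=axis, keepdims=True)), axis=axis)


def entropic_plan(x0: np.ndarray, x1: np.ndarray, eps: float, n_iter: int = 500) -> np.ndarray:
    """Log-domain Sinkhorn plan between uniform samples with squared Euclidean cost."""
    cost = np.sum((x0[:, None, :] - x1[None, :, :]) ** 2, axis=-1)
    log_a = np.full(len(x0), -np.log(len(x0)))
    log_b = np.full(len(x1), -np.log(len(x1)))
    f = np.zeros(len(x0))
    g = np.zeros(len(x1))
    for _ in range(n_iter):
        f = eps * (log_a - _logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - cost) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - cost) / eps)


def sf2m_targets(x0, x1, sigma, rng):
    """Sample (t, x_t) on Brownian bridges and return flow and score regression targets."""
    plan = entropic_plan(x0, x1, eps=2.0 * sigma**2)
    probs = plan.ravel() / plan.sum()
    idx = rng.choice(probs.size, size=len(x0), p=probs)
    i, j = np.divmod(idx, plan.shape[1])  # row/column of each sampled pair
    start, end = x0[i], x1[j]
    t = rng.uniform(0.01, 0.99, size=(len(x0), 1))  # stay away from t = 0 and t = 1
    mean = (1.0 - t) * start + t * end
    std = sigma * np.sqrt(t * (1.0 - t))
    noise = rng.standard_normal(start.shape)
    xt = mean + std * noise
    # Probability-flow velocity of the bridge: mean drift plus variance-change term
    flow_target = (1.0 - 2.0 * t) / (2.0 * t * (1.0 - t)) * (xt - mean) + (end - start)
    # Score of the Gaussian bridge marginal N(mean, std^2 I)
    score_target = -noise / std
    return t, xt, flow_target, score_target


rng = np.random.default_rng(0)
src = rng.normal(size=(16, 2))
tgt = rng.normal(loc=1.0, size=(16, 2))
t, xt, flow_target, score_target = sf2m_targets(src, tgt, sigma=1.0, rng=rng)
```

The log-domain updates keep the plan finite even for small σ, where the plain kernel exp(-C/ε) would underflow.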

    Improving and generalizing flow-based generative models with minibatch optimal transport

    Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their simulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single-cell dynamics, unsupervised image translation, and Schrödinger bridge inference.
    Comment: A version of this paper appeared in the New Frontiers in Learning, Control, and Dynamical Systems workshop at ICML 2023. Title change from v1. Code: https://github.com/atong01/conditional-flow-matchin
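In OT-CFM, each minibatch of source and target samples is first re-paired by an optimal-transport plan. The model is then regressed onto the straight-line velocity x1 - x0 at a random point on the segment between each pair. A minimal NumPy/SciPy sketch of one minibatch of targets, assuming equal batch sizes and a squared Euclidean cost, so that the exact OT plan is a permutation found by `scipy.optimize.linear_sum_assignment`. It also assumes a constant noise level `sigma` (default 0) and omits the velocity network and optimizer. The function names are hypothetical, not the API of the linked repository.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_cfm_batch(x0, x1, rng, sigma=0.0):
    """Re-pair a minibatch by exact OT, then sample (t, x_t) and the target velocity."""
    cost = np.sum((x0[:, None, :] - x1[None, :, :]) ** 2, axis=-1)
    rows, cols = linear_sum_assignment(cost)  # permutation minimizing total squared distance
    x0, x1 = x0[rows], x1[cols]
    t = rng.uniform(size=(len(x0), 1))
    # Point on the straight segment between the paired samples (optionally blurred by sigma)
    xt = (1.0 - t) * x0 + t * x1 + sigma * rng.standard_normal(x0.shape)
    ut = x1 - x0  # conditional velocity of the straight-line path
    return t, xt, ut


def cfm_loss(velocity_fn, t, xt, ut):
    """Mean squared regression error of a velocity model against the CFM targets."""
    return float(np.mean(np.sum((velocity_fn(t, xt) - ut) ** 2, axis=-1)))


# Hypothetical minibatch: noise samples and a shifted, shuffled copy as "data".
rng = np.random.default_rng(0)
noise = rng.normal(size=(32, 2))
data = (noise + np.array([3.0, 0.0]))[rng.permutation(32)]
t, xt, ut = ot_cfm_batch(noise, data, rng)
```

Replacing the OT assignment with a random pairing gives the independent-coupling variant. The OT re-pairing is what undoes the shuffle in this toy batch, so every target velocity equals the true shift.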

    Investigating the contributions of circadian pathway and insomnia risk genes to autism and sleep disturbances

    Sleep disturbance is prevalent in youth with Autism Spectrum Disorder (ASD). Researchers have posited that circadian dysfunction may contribute to sleep problems or exacerbate ASD symptomatology. However, there is limited genetic evidence for this. It is also unclear how insomnia risk genes identified through GWAS in general populations relate to ASD and to common sleep problems, such as insomnia traits, in ASD. We investigated the contribution of copy number variants (CNVs) encompassing circadian pathway genes and insomnia risk genes to ASD risk as well as to sleep disturbances in children with ASD. We studied 5860 ASD probands and 2092 unaffected siblings from the Simons Simplex Collection (SSC) and the MSSNG database, as well as 7509 individuals from two unselected populations (IMAGEN and Generation Scotland). Sleep duration and insomnia symptoms were parent-reported for SSC probands. We identified 335 and 616 rare CNVs encompassing circadian and insomnia risk genes, respectively. Deletions and duplications encompassing circadian genes were overrepresented in ASD probands compared with siblings and unselected controls. For insomnia risk genes, deletions (but not duplications) were associated with ASD in both cohorts. Results remained significant after adjusting for cognitive ability. CNVs containing circadian pathway and insomnia risk genes showed a stronger association with ASD than CNVs containing other genes. Circadian genes did not influence sleep duration or insomnia traits in ASD. Insomnia risk genes intolerant to haploinsufficiency increased the risk of insomnia when duplicated. CNVs encompassing circadian and insomnia risk genes increase ASD liability, with little to no observable impact on sleep disturbances.

    Genome wide analysis of gene dosage in 24,092 individuals estimates that 10,000 genes modulate cognitive ability

    Genomic copy number variants (CNVs) are routinely identified and reported back to patients with neuropsychiatric disorders, but their quantitative effects on essential traits such as cognitive ability are poorly documented. We have recently shown that the effect size of deletions on cognitive ability can be statistically predicted using measures of intolerance to haploinsufficiency. However, the effect sizes of duplications remain unknown. It is also unknown whether the effects of multigenic CNVs are driven by a few genes intolerant to haploinsufficiency or are distributed across tolerant genes as well. Here, we identified all CNVs > 50 kilobases in 24,092 individuals from unselected and autism cohorts with assessments of general intelligence. Statistical models used measures of intolerance to haploinsufficiency of the genes included in CNVs to predict their effect size on intelligence. Intolerant genes decrease general intelligence by 0.8 and 2.6 points of intelligence quotient when duplicated or deleted, respectively. Effect sizes showed no heterogeneity across cohorts. Validation analyses demonstrated that the models could predict CNV effect sizes with 78% accuracy. Data on the inheritance of 27,766 CNVs showed that deletions and duplications with the same effect size on intelligence occur de novo at the same frequency. We estimated that around 10,000 intolerant and tolerant genes negatively affect intelligence when deleted, and that less than 2% of them have large effect sizes. Genes encompassed in CNVs were not enriched in any GO terms, but gene regulation and brain expression were GO terms overrepresented in the intolerant subgroup. Such pervasive effects on cognition may be related to emergent properties of the genome that are not restricted to a limited number of biological pathways.
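The reported effect sizes lend themselves to a quick worked reading. Assume, purely for illustration, that the figures are per intolerant gene and that effects add up linearly within a CNV. A deletion spanning three intolerant genes would then shift IQ by about 3 × (-2.6) = -7.8 points, and a duplication of the same genes by about 3 × (-0.8) = -2.4 points. This is a simplification of the paper's statistical models, not the published predictor, and the names below are hypothetical.

```python
# Effect sizes quoted in the abstract, in IQ points per intolerant gene (assumed additive).
IQ_POINTS_PER_INTOLERANT_GENE = {"deletion": -2.6, "duplication": -0.8}


def expected_iq_shift(n_intolerant_genes: int, cnv_type: str) -> float:
    """Back-of-the-envelope IQ shift for a CNV, assuming per-gene effects simply add up."""
    if n_intolerant_genes < 0:
        raise ValueError("gene count must be non-negative")
    if cnv_type not in IQ_POINTS_PER_INTOLERANT_GENE:
        raise ValueError("cnv_type must be 'deletion' or 'duplication'")
    return IQ_POINTS_PER_INTOLERANT_GENE[cnv_type] * n_intolerant_genes


# Hypothetical CNV: a deletion spanning 3 intolerant genes.
print(round(expected_iq_shift(3, "deletion"), 1))
```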

    Genome-wide association scan identifies new variants associated with a cognitive predictor of dyslexia

    Developmental dyslexia (DD) is one of the most prevalent learning disorders, with a high impact on school and psychosocial development and high comorbidity with conditions like attention-deficit hyperactivity disorder (ADHD), depression, and anxiety. DD is characterized by deficits in different cognitive skills, including word reading, spelling, rapid naming, and phonology. To investigate the genetic basis of DD, we conducted a genome-wide association study (GWAS) of these skills within one of the largest studies available, including nine cohorts of reading-impaired and typically developing children of European ancestry (N = 2562-3468). We observed a genome-wide significant effect (p < 1 × 10⁻⁸) on rapid automatized naming of letters (RANlet) for variants on 18q12.2, within MIR924HG (micro-RNA 924 host gene; rs17663182, p = 4.73 × 10⁻⁹), and a suggestive association on 8q12.3 within NKAIN3 (encoding a cation transporter; rs16928927, p = 2.25 × 10⁻⁸). rs17663182 (18q12.2) also showed genome-wide significant multivariate associations with RAN measures (p = 1.15 × 10⁻⁸) and with all the cognitive traits tested (p = 3.07 × 10⁻⁸), suggesting (relational) pleiotropic effects of this variant. A polygenic risk score (PRS) analysis revealed significant genetic overlaps of some of the DD-related traits with educational attainment (EDUyears) and ADHD. Reading and spelling abilities were positively associated with EDUyears (p ≈ 10⁻⁵ to 10⁻⁷) and negatively associated with ADHD PRS (p ≈ 10⁻⁸ to 10⁻¹⁷). This corroborates, at the genome-wide level, a long-standing hypothesis on the partly shared genetic etiology of DD and ADHD. Our findings suggest new candidate DD susceptibility genes and provide new insights into the genetics of dyslexia and its comorbidities.
    Peer reviewed
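In its simplest form, a polygenic risk score is a weighted sum of an individual's effect-allele dosages, with weights taken from the per-allele effect sizes of an external GWAS (here, for example, a GWAS of educational attainment or ADHD). A minimal sketch with hypothetical variants and weights. Real PRS pipelines also handle clumping or shrinkage of effect sizes, allele and strand alignment, and P-value thresholds, none of which are shown, and the function names are hypothetical.

```python
import numpy as np


def polygenic_score(dosages: np.ndarray, betas: np.ndarray) -> np.ndarray:
    """PRS_i = sum_j beta_j * dosage_ij.

    dosages: (n_individuals, n_variants) effect-allele counts in [0, 2].
    betas:   (n_variants,) per-allele effect sizes from a discovery GWAS.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    if dosages.shape[1] != betas.shape[0]:
        raise ValueError("one beta per variant is required")
    return dosages @ betas


def standardize(scores: np.ndarray) -> np.ndarray:
    """Center and scale scores so they can be used as a covariate in association tests."""
    return (scores - scores.mean()) / scores.std()


# Hypothetical toy data: 3 individuals, 4 variants.
dosages = np.array([[0, 1, 2, 1], [2, 2, 0, 0], [1, 0, 1, 2]])
betas = np.array([0.02, -0.01, 0.03, 0.005])
print(standardize(polygenic_score(dosages, betas)))
```

The standardized score would then enter a regression against a reading or spelling measure to test for the kind of genetic overlap reported above.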